Results 1 - 20 of 95,383
1.
Methods Mol Biol ; 2787: 107-122, 2024.
Article in English | MEDLINE | ID: mdl-38656485

ABSTRACT

Genetic diversity refers to the variety of genetic traits within a population or a species. It is an essential aspect of both plant ecology and plant breeding because it contributes to the adaptability, survival, and resilience of populations in changing environments. This chapter outlines a pipeline for estimating genetic diversity statistics from reduced representation or whole genome sequencing data. The pipeline involves obtaining DNA sequence reads, mapping the corresponding reads to a reference genome, calling variants from the alignments, and generating an unbiased estimation of nucleotide diversity and divergence between populations. The pipeline is suitable for single-end Illumina reads and can be adjusted for paired-end reads. The resulting pipeline provides a comprehensive approach for aligning and analyzing sequencing data to estimate genetic diversity.
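
The last step of such a pipeline, estimating nucleotide diversity, can be illustrated with a minimal Python sketch. It assumes the variant calls have already been converted into equal-length aligned haplotype strings; the function name `nucleotide_diversity` and the toy sequences are hypothetical and are not taken from the chapter. The sketch computes π as the mean number of pairwise differences per site, which is one common definition; the chapter's unbiased estimator may treat missing data and invariant sites differently.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment: three haplotypes, eight sites.
haplotypes = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(haplotypes)
```

Here the three pairs differ at 1, 1, and 2 sites, so π is 4 differences over 3 pairs × 8 sites.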


Subject(s)
Genetic Variation , Genome, Plant , Plants , Plants/genetics , Software , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Genomics/methods
2.
Methods Mol Biol ; 2787: 225-243, 2024.
Article in English | MEDLINE | ID: mdl-38656493

ABSTRACT

Coffee, an important agricultural product for tropical producing countries, is facing challenges due to climate change, including periods of drought, irregular rain distribution, and high temperatures. These changes result in plant water stress, leading to significant losses in coffee productivity and quality. Understanding the processes that affect coffee flowering is crucial for improving productivity and quality. In this chapter, we describe a protocol for transcriptome analysis using software available on the Internet, mainly in the Galaxy Platform, using RNA-Seq data from flowers collected from different parts of the coffee tree. The methods presented in this chapter provide a comprehensive protocol for transcriptome analysis of differentially expressed genes from flowers of the coffee plant. This knowledge can be utilized in coffee genetic improvement programs, particularly in the selection of cultivars that are tolerant to water deficit.


Subject(s)
Coffea , Flowers , Gene Expression Profiling , Gene Expression Regulation, Plant , Transcriptome , Flowers/genetics , Coffea/genetics , Gene Expression Profiling/methods , Transcriptome/genetics , Software , Computational Biology/methods , RNA-Seq/methods
3.
Methods Mol Biol ; 2788: 157-169, 2024.
Article in English | MEDLINE | ID: mdl-38656513

ABSTRACT

This chapter presents a comprehensive approach to predict novel miRNAs encoded by plant viruses and identify their target plant genes, through integration of various ab initio computational approaches. The predictive process begins with the analysis of plant viral sequences using the VMir Analyzer software. VMir Viewer software is then used to extract primary hairpins from these sequences. To distinguish real miRNA precursors from pseudo miRNA precursors, the MiPred web-based software is employed. Verified real pre-miRNA sequences with a minimum free energy of < -20 kcal/mol are further analyzed using the RNAshapes software. Validation of predictions involves comparing them with available Expressed Sequence Tags (ESTs) from the relevant plant using BlastN. Short sequences with lengths ranging from 19 to 25 nucleotides and exhibiting <5 mismatches are prioritized for miRNA prediction. The precise locations of these short sequences within pre-miRNA structures generated using RNAshapes are identified, with a focus on those situated on the 5' and 3' arms of the structures, indicating potential miRNAs. Sequences within the arms of pre-miRNA structures are used to predict target sites within the ESTs of the specific plant with the psRNATarget software, revealing genes with potential regulatory roles in the plant. To confirm the outcome of target prediction, results are individually submitted to the RNAhybrid web-based software. For practical demonstration, this approach is applied to analyze African cassava mosaic virus (ACMV) and East African cassava mosaic virus-Uganda (EACMV-UG), as well as the ESTs of Jatropha and cassava.
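
The length and mismatch criteria stated in this abstract (19 to 25 nucleotides, fewer than 5 mismatches) can be expressed as a small filter. The following is a minimal Python sketch, assuming each BlastN hit has already been reduced to a pair of equal-length ungapped strings; the function names and the toy sequences are hypothetical, and none of the tools named in the abstract are called here.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_candidate(query, hit, min_len=19, max_len=25, max_mismatches=4):
    """Keep ungapped matches of 19-25 nt with fewer than 5 mismatches."""
    return (
        min_len <= len(query) <= max_len
        and len(query) == len(hit)
        and count_mismatches(query, hit) <= max_mismatches
    )


# Hypothetical 21-nt query and two hits with 2 and 5 mismatches.
query = "ACGU" * 5 + "A"
close_hit = "UU" + query[2:]
distant_hit = "N" * 5 + query[5:]
kept = [h for h in (close_hit, distant_hit) if is_candidate(query, h)]
```

Only `close_hit` passes; `distant_hit` reaches the 5-mismatch limit and is dropped.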


Subject(s)
Computational Biology , MicroRNAs , Plant Viruses , RNA, Viral , Software , MicroRNAs/genetics , Plant Viruses/genetics , Computational Biology/methods , RNA, Viral/genetics , Genes, Plant , Nucleic Acid Conformation , Plants/virology , Plants/genetics , Expressed Sequence Tags
4.
Methods Mol Biol ; 2788: 139-155, 2024.
Article in English | MEDLINE | ID: mdl-38656512

ABSTRACT

This computational protocol describes how to use pyPGCF, a Python software package that runs in the Linux environment, to analyze bacterial genomes and perform: (i) phylogenomic analysis, (ii) species demarcation, (iii) identification of the core proteins of a bacterial genus and its individual species, (iv) identification of species-specific fingerprint proteins that are found in all strains of a species and, at the same time, are absent from all other species of the genus, (v) functional annotation of the core and fingerprint proteins with eggNOG, and (vi) identification of secondary metabolite biosynthetic gene clusters (smBGCs) with antiSMASH. This software has already been used to analyze bacterial genera and species that are important for plants (e.g., Pseudomonas, Bacillus, Streptomyces). In addition, we provide a test dataset and example commands showing how to analyze 165 genomes from 55 species of the genus Bacillus. The main advantages of pyPGCF are that: (i) it uses adjustable orthology cut-offs, (ii) it identifies species-specific fingerprints, and (iii) its computational cost scales linearly with the number of genomes being analyzed. Therefore, pyPGCF is able to deal with a very large number of bacterial genomes, in reasonable timescales, using widely available levels of computing power.
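
The definitions of core and fingerprint proteins given in this abstract reduce to set operations. The sketch below is an illustration of those definitions only, not pyPGCF's interface or algorithm: it assumes orthology assignment has already been done, so each strain is a set of orthologous-group identifiers, and every name in it (`species_core`, `fingerprints`, `sp_A`, `og1`, and so on) is hypothetical.

```python
def species_core(strains):
    """Orthologous groups present in every strain of one species."""
    return set.intersection(*strains)


def fingerprints(genus, species):
    """Groups found in all strains of `species` and in no strain of any other species."""
    core = species_core(genus[species])
    others = set()
    for name, strains in genus.items():
        if name != species:
            others |= set.union(*strains)
    return core - others


# Hypothetical genus: two species with two strains each.
genus = {
    "sp_A": [{"og1", "og2", "og3"}, {"og1", "og2", "og4"}],
    "sp_B": [{"og1", "og5"}, {"og1", "og5", "og3"}],
}

# Genus core: groups present in every strain of every species.
genus_core = set.intersection(*(species_core(s) for s in genus.values()))
fingerprint_a = fingerprints(genus, "sp_A")
fingerprint_b = fingerprints(genus, "sp_B")
```

In this toy genus `og1` is the only genus-wide core group, while `og2` and `og5` are the fingerprints of `sp_A` and `sp_B` respectively.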


Subject(s)
Genome, Bacterial , Phylogeny , Plants , Software , Plants/genetics , Plants/microbiology , Bacterial Proteins/genetics , Genomics/methods , Computational Biology/methods , Bacteria/genetics , Bacteria/classification , Multigene Family , Species Specificity
5.
Methods Mol Biol ; 2788: 171-193, 2024.
Article in English | MEDLINE | ID: mdl-38656514

ABSTRACT

Plants produce diverse specialized metabolites (SMs) that do not participate in plant growth and development but help them adapt to various environmental conditions. In addition to aiding in plant adaptation, different SMs serve as active ingredients for pharmaceutical and cosmetics products. However, despite their significant role in plant adaptation and industrial importance, the genes involved in the biosynthesis and regulation of many SMs remain largely unknown. This hinders deciphering the specific role of SMs in plant adaptation and limits their industrial utilization. Since many SMs pathway genes are expected to act in tight association with each other within a coexpression network, the network biology approach, such as weighted gene coexpression network analysis, could be used to identify the unknown genes. This chapter describes a workflow for constructing a gene coexpression network to identify genes that could be associated with the biosynthesis and regulation of SMs.
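
The idea of finding unknown pathway genes through coexpression can be sketched in a few lines of Python. This is a deliberately simplified, hard-threshold version: it links genes whose Pearson correlation exceeds a cutoff, whereas weighted gene coexpression network analysis uses soft thresholding and module detection, which are not reproduced here. The gene names, expression values, and the 0.9 cutoff are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / (sx * sy)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across five samples.
expression = {
    "known_SM_gene": [1.0, 2.0, 3.0, 4.0, 5.0],
    "candidate_1": [2.1, 3.9, 6.2, 8.0, 9.9],
    "candidate_2": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expression)
```

With these values only `candidate_1` is linked to `known_SM_gene`, which would make it the gene to follow up.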


Subject(s)
Gene Expression Regulation, Plant , Gene Regulatory Networks , Plants , Secondary Metabolism , Secondary Metabolism/genetics , Plants/genetics , Plants/metabolism , Gene Expression Profiling/methods , Computational Biology/methods , Genes, Plant
6.
Methods Mol Biol ; 2788: 97-136, 2024.
Article in English | MEDLINE | ID: mdl-38656511

ABSTRACT

Plant specialized metabolites have diversified vastly over the course of plant evolution, and they are considered key players in complex interactions between plants and their environment. The chemical diversity of these metabolites has been widely explored and utilized in agriculture and crop enhancement, the food industry, and drug development, among other areas. However, the immensity of the plant metabolome can make its exploration challenging. Here we describe a protocol for exploring plant specialized metabolites that combines high-resolution mass spectrometry and computational metabolomics strategies, including molecular networking, identification of structural motifs, as well as prediction of chemical structures and metabolite classes.


Subject(s)
Mass Spectrometry , Metabolome , Metabolomics , Plants , Metabolomics/methods , Plants/metabolism , Mass Spectrometry/methods , Computational Biology/methods
7.
Zhonghua Yi Xue Za Zhi ; 104(16): 1410-1417, 2024 Apr 23.
Article in Chinese | MEDLINE | ID: mdl-38644292

ABSTRACT

Objective: To investigate the genetic and expression characteristics of transcription factor IIH (TFIIH) in the pre-initiation complex in prostate cancer (PCa) and its relationship with prostate cancer progression. Methods: The expression characteristics and clinical significance of TFIIH subunits were analyzed in 495 cases of PCa and 52 cases of cancer-adjacent tissue from The Cancer Genome Atlas-Prostate Adenocarcinoma (TCGA-PRAD) database. A PCa microarray chip was used to verify the correlation between the key factor General Transcription Factor IIH Subunit 4 (GTF2H4) in TFIIH and clinical features. Results: The 495 patients with PCa were (61.01±6.82) years old. The mRNA expression of ERCC3, GTF2H4, and MNAT1 was high in PCa tissues with GS≥8 (P<0.05). The expression of GTF2H4 and MNAT1 was related to the pathological stage (P<0.05). High expression of GTF2H4 was associated with a higher biochemical recurrence (BCR) rate in PCa patients (HR=2.47, 95%CI: 1.62-3.77, P<0.001), and GTF2H4 had a better predictive effect for BCR in PCa patients (AUC at the 3rd, 5th, and 7th years all >0.7) than the other subunits; this was verified in four additional databases. Single-factor Cox regression analysis showed that GTF2H4 was a risk factor for BCR (HR=2.470, 95%CI: 1.620-3.767, P<0.001) and GTF2H5 was a protective factor (HR=0.506, 95%CI: 0.336-0.762, P=0.001). Immunohistochemical staining showed that the protein expression of GTF2H4 was correlated with the clinical features of PCa patients. The differences in the above results were statistically significant. Conclusion: GTF2H4, the key factor of TFIIH, is highly expressed in PCa and indicates a poor prognosis.


Subject(s)
Computational Biology , Prostatic Neoplasms , Humans , Male , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , Prostatic Neoplasms/genetics , Prognosis , Middle Aged , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics , DNA Helicases/metabolism , DNA Helicases/genetics , Aged , Transcription Factors, TFII/metabolism , Transcription Factors, TFII/genetics
8.
Sci Rep ; 14(1): 9155, 2024 04 21.
Article in English | MEDLINE | ID: mdl-38644393

ABSTRACT

Deep learning models (DLMs) have gained importance in predicting, detecting, translating, and classifying a diversity of inputs. In bioinformatics, DLMs have been used to predict protein structures, transcription factor-binding sites, and promoters. In this work, we propose a hybrid model to identify transcription factors (TFs) among prokaryotic and eukaryotic protein sequences, named Deep Regulation (DeepReg) model. Two architectures were used in the DL model: a convolutional neural network (CNN), and a bidirectional long-short-term memory (BiLSTM). DeepReg reached a precision of 0.99, a recall of 0.97, and an F1-score of 0.98. The quality of our predictions, the bias-variance trade-off approach, and the characterization of new TF predictions were evaluated and compared against those produced by DeepTFactor, as well as against experimental data from three model organisms. Predictions based on our DLM tended to exhibit less variance and bias than those from DeepTFactor, thus increasing reliability and decreasing overfitting.
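
The three metrics reported for DeepReg are linked by the standard definition of the F1-score as the harmonic mean of precision and recall. The short Python sketch below applies that definition to the reported precision and recall; the function name `f1_score` is a local helper written for this illustration and is not part of DeepReg.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.99, 0.97)
```

The result is about 0.9799, which rounds to the reported F1-score of 0.98.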


Subject(s)
Deep Learning , Transcription Factors , Transcription Factors/genetics , Transcription Factors/metabolism , Computational Biology/methods , Prokaryotic Cells/metabolism , Neural Networks, Computer , Eukaryota/genetics , Genome , Eukaryotic Cells/metabolism , Binding Sites
9.
BMC Bioinformatics ; 25(1): 164, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664601

ABSTRACT

Multimodal integration combines information from different sources or modalities to gain a more comprehensive understanding of a phenomenon. The challenges in multi-omics data analysis lie in the complexity, high dimensionality, and heterogeneity of the data, which demands sophisticated computational tools and visualization methods for proper interpretation and visualization of multi-omics data. In this paper, we propose a novel method, termed Orthogonal Multimodality Integration and Clustering (OMIC), for analyzing CITE-seq. Our approach enables researchers to integrate multiple sources of information while accounting for the dependence among them. We demonstrate the effectiveness of our approach using CITE-seq data sets for cell clustering. Our results show that our approach outperforms existing methods in terms of accuracy, computational efficiency, and interpretability. We conclude that our proposed OMIC method provides a powerful tool for multimodal data analysis that greatly improves the feasibility and reliability of integrated data.


Subject(s)
Single-Cell Analysis , Cluster Analysis , Single-Cell Analysis/methods , Computational Biology/methods , Humans , Algorithms
10.
BMC Bioinformatics ; 25(1): 165, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664627

ABSTRACT

BACKGROUND: The annotation of protein sequences in public databases has long posed a challenge in molecular biology. This issue is particularly acute for viral proteins, which demonstrate limited homology to known proteins when using alignment, k-mer, or profile-based homology search approaches. A novel methodology employing Large Language Models (LLMs) addresses this methodological challenge by annotating protein sequences based on embeddings. RESULTS: Central to our contribution is the soft alignment algorithm, drawing from traditional protein alignment but leveraging embedding similarity at the amino acid level to bypass the need for conventional scoring matrices. This method not only surpasses pooled embedding-based models in efficiency but also in interpretability, enabling users to easily trace homologous amino acids and delve deeper into the alignments. Far from being a black box, our approach provides transparent, BLAST-like alignment visualizations, combining traditional biological research with AI advancements to elevate protein annotation through embedding-based analysis while ensuring interpretability. Tests using the Virus Orthologous Groups and ViralZone protein databases indicated that the novel soft alignment approach recognized and annotated sequences that both blastp and pooling-based methods, which are commonly used for sequence annotation, failed to detect. CONCLUSION: The embeddings approach shows the great potential of LLMs for enhancing protein sequence annotation, especially in viral genomics. These findings present a promising avenue for more efficient and accurate protein function inference in molecular biology.


Subject(s)
Algorithms , Molecular Sequence Annotation , Sequence Alignment , Molecular Sequence Annotation/methods , Sequence Alignment/methods , Viral Proteins/genetics , Viral Proteins/chemistry , Genes, Viral , Databases, Protein , Computational Biology/methods , Amino Acid Sequence
11.
BMC Musculoskelet Disord ; 25(1): 291, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622662

ABSTRACT

OBJECTIVES: The aim of this study was to explore the long non-coding RNA (lncRNA) expression profiles in the serum of patients with ankylosing spondylitis (AS). The role of these lncRNAs in this complex autoimmune condition needs to be evaluated. METHODS: We used high-throughput whole-transcriptome sequencing to generate sequencing data from three patients with AS and three normal controls (NC). Then, we performed bioinformatics analyses to identify the functional and biological processes associated with differentially expressed lncRNAs (DElncRNAs). We confirmed the validity of our RNA-seq data by assessing the expression of eight lncRNAs via quantitative reverse transcription polymerase chain reaction (qRT-PCR) in 20 AS and 20 NC samples. We measured the correlation between the expression levels of lncRNAs and patient clinical index values using the Spearman correlation test. RESULTS: We identified 72 significantly upregulated and 73 significantly downregulated lncRNAs in AS patients compared to NC. qRT-PCR was performed to validate the expression of selected DElncRNAs; the results demonstrated that the expression levels of MALAT1:24, NBR2:9, lnc-DLK1-35:13, lnc-LARP1-1:1, lnc-AIPL1-1:7, and lnc-SLC12A7-1:16 were consistent with the sequencing analysis results. Enrichment analysis showed that DElncRNAs mainly participated in immune and inflammatory response pathways, such as regulation of protein ubiquitination, major histocompatibility complex class I-mediated antigen processing and presentation, MAP kinase activation, and interleukin-17 signaling pathways. In addition, a competing endogenous RNA network was constructed to determine the interaction among the lncRNAs, microRNAs, and mRNAs based on the confirmed lncRNAs (MALAT1:24 and NBR2:9). We further found the expression of MALAT1:24 and NBR2:9 to be positively correlated with disease severity.
CONCLUSION: Taken together, our study presents a comprehensive overview of lncRNAs in the serum of AS patients, thereby contributing novel perspectives on the underlying pathogenic mechanisms of this condition. In addition, our study predicted that MALAT1 has the potential to be deeply involved in the pathogenesis of AS.


Subject(s)
MicroRNAs , RNA, Long Noncoding , Spondylitis, Ankylosing , Humans , RNA, Long Noncoding/genetics , Gene Expression Profiling/methods , Spondylitis, Ankylosing/genetics , MicroRNAs/metabolism , Computational Biology/methods , Gene Regulatory Networks , Adaptor Proteins, Signal Transducing/genetics , 60528
12.
Genome Biol ; 25(1): 97, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622738

ABSTRACT

BACKGROUND: As most viruses remain uncultivated, metagenomics is currently the main method for virus discovery. Detecting viruses in metagenomic data is not trivial. In the past few years, many bioinformatic virus identification tools have been developed for this task, making it challenging to choose the right tools, parameters, and cutoffs. As all these tools measure different biological signals, and use different algorithms and training and reference databases, it is imperative to conduct independent benchmarking to give users objective guidance. RESULTS: We compare the performance of nine state-of-the-art virus identification tools in thirteen modes on eight paired viral and microbial datasets from three distinct biomes, including a new complex dataset from Antarctic coastal waters. The tools have highly variable true positive rates (0-97%) and false positive rates (0-30%). PPR-Meta best distinguishes viral from microbial contigs, followed by DeepVirFinder, VirSorter2, and VIBRANT. Different tools identify different subsets of the benchmarking data and all tools, except for Sourmash, find unique viral contigs. Performance of tools improved with adjusted parameter cutoffs, indicating that adjustment of parameter cutoffs before usage should be considered. CONCLUSIONS: Together, our independent benchmarking facilitates the selection of bioinformatic virus identification tools and gives suggestions for parameter adjustments to viromics researchers.
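
The true and false positive rates used in such a benchmark follow from a confusion matrix over contigs with known origin. The following is a minimal Python sketch of the two standard formulas; the counts are hypothetical and do not correspond to any tool or dataset in the study.

```python
def rates(tp, fp, tn, fn):
    """True positive rate and false positive rate from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of viral contigs detected
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of microbial contigs flagged
    return tpr, fpr


# Hypothetical paired dataset: 100 viral and 100 microbial contigs.
tpr, fpr = rates(tp=90, fp=15, tn=85, fn=10)
```

For these counts the tool would detect 90% of the viral contigs while wrongly flagging 15% of the microbial ones.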


Subject(s)
Benchmarking , Viruses , Metagenome , Ecosystem , Metagenomics/methods , Computational Biology/methods , Databases, Genetic , Viruses/genetics
13.
Sci Rep ; 14(1): 9040, 2024 04 19.
Article in English | MEDLINE | ID: mdl-38641637

ABSTRACT

Immune thrombocytopenia (ITP), an acquired autoimmune disease, is characterized by immune-mediated platelet destruction. A biomarker is a biological entity that contributes to disease pathogenesis and reflects disease activity. Metabolic alterations are reported to be associated with the occurrence of various diseases. As metabolic biomarkers for ITP have not been identified, this study aimed to identify metabolism-related differentially expressed genes as potential biomarkers for the pathogenesis of ITP using bioinformatic analyses. The microarray expression data of the peripheral blood mononuclear cells were downloaded from the Gene Expression Omnibus database (GSE112278 download link: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112278 ). Key module genes were intersected with metabolism-related genes to obtain the metabolism-related key candidate genes. The hub genes were screened based on the degree function in the Cytoscape software. The key ITP-related genes were subjected to functional enrichment analysis. Immune infiltration analysis was performed using a single-sample gene set enrichment analysis algorithm to evaluate the differential infiltration levels of immune cell types between ITP patients and controls. Molecular subtypes were identified based on the expression of hub genes. The expression of hub genes in the ITP patients was validated using quantitative real-time polymerase chain reaction analysis. This study identified five hub genes (ADH4, CYP7A1, CYP1A2, CYP8B1, and NR1H4), which were associated with the pathogenesis of ITP, and two molecular subtypes of ITP. Among these hub genes, CYP7A1 and CYP8B1, which are involved in cholesterol metabolism, were further verified in clinical samples.


Subject(s)
Purpura, Thrombocytopenic, Idiopathic , Thrombocytopenia , Humans , Purpura, Thrombocytopenic, Idiopathic/genetics , Leukocytes, Mononuclear , Steroid 12-alpha-Hydroxylase , Biomarkers , Computational Biology
14.
Genome Biol ; 25(1): 101, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641647

ABSTRACT

Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor's variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.


Subject(s)
Genome , Genomics , Genomics/methods , Computational Biology , INDEL Mutation , Bias , Sequence Analysis, DNA/methods , Software , High-Throughput Nucleotide Sequencing/methods
15.
BMC Bioinformatics ; 25(1): 157, 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38643108

ABSTRACT

BACKGROUND: The identification of essential proteins can help in understanding the minimum requirements for cell survival and development, in discovering drug targets, and in preventing disease. Node ranking methods are now a common way to identify essential proteins, but the poor data quality of the underlying protein interaction network (PIN) has limited the accuracy with which these methods identify essential proteins. Therefore, researchers have constructed refined networks by considering certain biological properties of interacting protein pairs to improve the performance of node ranking methods in the PIN. Studies show that proteins in a complex are more likely to be essential than proteins not present in a complex. However, modularity is usually ignored by PIN refinement methods. METHODS: Based on this, we proposed a network refinement method based on module discovery and biological information. The idea is, first, to extract the maximal connected subgraph in the PIN and divide it into different modules using the Fast-unfolding algorithm; then, to detect critical modules according to the orthologous information, subcellular localization information, and topology information within each module; and finally, to construct a more refined network (CM-PIN) using the identified critical modules. RESULTS: To evaluate the effectiveness of the proposed method, we used 12 typical node ranking methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR, PeC, WDC) to compare their overall performance on the CM-PIN with that on the S-PIN, D-PIN, and RD-PIN. The experimental results showed that the CM-PIN was optimal in terms of the number of essential proteins identified, the precision-recall curve, the Jackknifing method, and other criteria, and can help to identify essential proteins more accurately.
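
Two of the building blocks named in this abstract, extracting the maximal connected subgraph of the PIN and ranking proteins by degree centrality (DC), can be sketched with the Python standard library alone. This is an illustration under simplifying assumptions: the Fast-unfolding module detection and the critical-module scoring are not reproduced, and the protein names and interactions are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency map from a list of interacting protein pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def largest_component(graph):
    """Node set of the largest connected component, found by breadth-first search."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def rank_by_degree(graph, nodes):
    """Rank `nodes` by degree centrality, highest first (ties broken by name)."""
    degree = {n: len(graph[n] & nodes) for n in nodes}
    return sorted(degree, key=lambda n: (-degree[n], n))


# Hypothetical PIN with one four-protein component and one isolated pair.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P6")]
graph = build_graph(edges)
component = largest_component(graph)
ranking = rank_by_degree(graph, component)
```

In this toy network `P1` has the most interaction partners and is ranked first, while the isolated `P5`-`P6` pair is discarded before ranking.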


Subject(s)
Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Protein Interaction Mapping/methods , Algorithms , Protein Interaction Maps , Computational Biology/methods
16.
PLoS One ; 19(4): e0301995, 2024.
Article in English | MEDLINE | ID: mdl-38635539

ABSTRACT

Breast cancer (BC) is the most common cancer among women, with high morbidity and mortality. Therefore, new research is still needed for biomarker detection. The GSE101124 and GSE182471 datasets were obtained from the Gene Expression Omnibus (GEO) database to evaluate differentially expressed circular RNAs (circRNAs). The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) databases were used to identify the significantly dysregulated microRNAs (miRNAs) and genes considering the Prediction Analysis of Microarray classification (PAM50). The circRNA-miRNA-mRNA relationship was investigated using the Cancer-Specific CircRNA, miRDB, miRTarBase, and miRWalk databases. The circRNA-miRNA-mRNA regulatory network was annotated using Gene Ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. The protein-protein interaction network was constructed with the STRING database and visualized with the Cytoscape tool. Then, raw miRNA data and genes were filtered using selection criteria according to a specific expression level in PAM50 subgroups. A bottleneck method was utilized to obtain highly interacting hub genes using the cytoHubba Cytoscape plugin. Disease-Free Survival and Overall Survival analyses were performed for these hub genes, which were detected within the miRNA and circRNA axis in our study. We identified three circRNAs, three miRNAs, and eighteen candidate target genes that may play an important role in BC. In addition, it has been determined that these molecules can be useful in the classification of BC, especially in determining the basal-like breast cancer (BLBC) subtype. We conclude that the hsa_circ_0000515/miR-486-5p/SDC1 axis may be an important biomarker candidate in distinguishing patients in the BLBC subgroup of BC.


Subject(s)
Breast Neoplasms , MicroRNAs , Humans , Female , RNA, Circular/genetics , Breast Neoplasms/genetics , MicroRNAs/genetics , Computational Biology , Biomarkers , Gene Regulatory Networks
17.
PLoS One ; 19(4): e0300350, 2024.
Article in English | MEDLINE | ID: mdl-38635808

ABSTRACT

Monogenic diabetes is characterized as a group of diseases caused by rare variants in single genes. As with other rare diseases, multiple genes have been linked to monogenic diabetes with different measures of pathogenicity, but the information on the genes and variants is not unified among different resources, making it challenging to process them informatically. We have developed an automated pipeline for collecting and harmonizing data on genetic variants linked to monogenic diabetes. Furthermore, we have translated variant genetic sequences into protein sequences accounting for all protein isoforms and their variants. This allows researchers to consolidate information on variant genes and proteins linked to monogenic diabetes and facilitates their study using proteomics or structural biology. Our open and flexible implementation using Jupyter notebooks enables tailoring and modifying the pipeline and its application to other rare diseases.


Subject(s)
Diabetes Mellitus , Proteomics , Humans , Rare Diseases/genetics , Genomics , Computational Biology , Diabetes Mellitus/genetics
18.
PLoS Comput Biol ; 20(4): e1011945, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38578805

ABSTRACT

Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.


Subject(s)
Computational Biology , Drug Discovery , Machine Learning , Neural Networks, Computer , Humans , Computational Biology/methods , Drug Discovery/methods , Algorithms , Melanoma , Probability , Colorectal Neoplasms
19.
World J Microbiol Biotechnol ; 40(5): 156, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38587708

ABSTRACT

In the post-genome era, great progress has been made in metabolic engineering using recombinant DNA technology to enhance the production of high-value products by Streptomyces. With the development of microbial genome sequencing techniques and bioinformatic tools, a growing number of secondary metabolite (SM) biosynthetic gene clusters in Streptomyces and their biosynthetic logics have been uncovered and elucidated. In order to increase our knowledge about transcriptional regulators in SM of Streptomyces, this review firstly makes a comprehensive summary of the characterized factors involved in enhancing SM production and awakening SM biosynthesis. Future perspectives on transcriptional regulator engineering for new SM biosynthesis by Streptomyces are also provided.


Subject(s)
Streptomyces , Streptomyces/genetics , Secondary Metabolism/genetics , Chromosome Mapping , Computational Biology , Metabolic Engineering
20.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38588573

ABSTRACT

SUMMARY: Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources for downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i.e. the emergence of homogenous cells, in large-scale scCAS data. Through comprehensive experiments on five datasets of various protocols, sample sizes, dimensions, number of cell types, and degrees of cell-type imbalance, EpiCarousel outperformed baseline methods in systematic evaluation of memory usage, computational time, and multiple downstream analyses including cell type identification. Moreover, EpiCarousel executes preprocessing and downstream cell clustering on the atlas-level dataset with 707 043 cells and 1 154 611 peaks within 2 h consuming <75 GB of RAM and provides superior performance for characterizing cell heterogeneity than state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: The EpiCarousel software is well-documented and freely available at https://github.com/biox-nku/epicarousel. It can be seamlessly interoperated with extensive scCAS analysis toolkits.


Subject(s)
Chromatin , Single-Cell Analysis , Software , Chromatin/metabolism , Single-Cell Analysis/methods , Humans , Genomics/methods , Computational Biology/methods